Now and in the future, face masks are an important strategy for protecting people when a new,
life-threatening contagious disease that spreads through the air appears. The coronavirus disease
2019 (COVID-19) pandemic is currently a serious health emergency, and people in public areas need
to be protected from its negative consequences. The World Health Organization (WHO) advises
numerous methods to reduce infection rates and to avoid depleting the available medical resources
in the absence of efficient antivirals. Wearing masks is a non-pharmaceutical strategy to lessen
the susceptibility to COVID-19 infection. This research aims to create an efficient face mask
identification system based on deep learning, which has proven to be beneficial in many real-world
applications. The system uses a transfer learning method with the MobileNetV2 model to classify
people who wear face masks properly, wear face masks improperly, or are without masks. The results
demonstrate that the proposed system has an accuracy of 99.4%, which is higher than current
systems.

Keywords: COVID-19, Deep learning, Face mask, Image processing, MobileNetV2

This is an open access article under the CC BY-SA license. 
Corresponding Author: 
Zeyad Qasim Habeeb 


Department of Biomedical Engineering, University of Technology-Iraq 
Baghdad, Iraq 
Email: zeyad.q.habeeb@uotechnology.edu.iq


1. INTRODUCTION 

Coronavirus disease 2019 (COVID-19) is a contagious disease that is still easily transmissible, and
innumerable patients have become sick or died [1]. The most frequent symptoms of COVID-19 include
muscle or body aches, tiredness, diarrhea, vomiting, congested or runny nose, sore throat,
headache, cough, and fever [2]. No clinically licensed antiviral medication against COVID-19 has
been reported as of yet. The entire human population is facing significant health, economic,
environmental, and societal issues as a result of this disease [3]. People cannot isolate
themselves from society and stay disconnected from the world [4]. It is challenging to manually
check people in public places for face masks. Therefore, it is necessary to develop automated
techniques for identifying face masks [5]. Since the appearance of the COVID-19 pandemic, the field
of computer vision has made major advancements in face mask detection [6]. However, many face mask
detection technologies still struggle with limited accuracy or with detecting improperly worn
masks.

The following are the primary contributions of this research. First, the MobileNetV2 model, a
cutting-edge object detector that uses a convolutional neural network, is the base of the proposed
system for detecting face masks. MobileNetV2 has proven capable of performing typical object
detection tasks, but its evaluation for detecting incorrectly worn face masks is insufficient.
Second, a transfer learning technique has been applied to the MobileNetV2 detector, which enhanced
the performance and led to superior outcomes when compared to cutting-edge techniques. Third, the
proposed system is capable of detecting improperly worn masks as well as properly worn masks and no
masks. Fourth, the Haar cascade classifier [2] has been used in this research to extract frontal
face images from the masked faces (MAFA) dataset. This classifier has been used because the MAFA
dataset contains noisy data, so frontal face images need to be isolated from the noisy photos using
the Haar wavelet approach. Training the proposed system with noise-free data can lead to better
performance. An overview of our research is given in Figure 1.

Journal homepage: http://beei.org
Bulletin of Electr Eng & Inf, ISSN: 2302-9285

The article is structured as follows: section 2 reviews earlier studies on face mask detection.
Section 3 describes the convolutional neural network (CNN) architecture that was utilized and
explains transfer learning and how the CNN model is trained and evaluated for face mask
recognition. Section 4 presents the datasets, evaluation metrics, and results of this research.
Section 5 outlines the conclusions and recommendations for future work.


Hypothesis: automatically detecting face masks is important for protecting people when a new,
life-threatening contagious disease that spreads through the air appears.

Objectives: present and build an automatic face mask identification system based on image
processing and the MobileNetV2 model.

Methodology: because of the limited accuracy of current face mask detection systems, the
MobileNetV2 model and transfer learning have been used to increase the accuracy. The proposed
system is capable of detecting improperly worn masks as well as properly worn masks and no masks.

Results: the proposed approach outperforms other methods with higher accuracy.

Figure 1. Hypothesis, main objective, method, and result


2. LITERATURE SURVEY 

Wearing masks is a crucial way to avoid COVID-19 infection [7]. Recent research employs deep
learning for face mask detection [8]. Many cutting-edge, pre-trained deep learning models,
including you only look once (YOLO) and faster regions with convolutional neural networks
(R-CNN), were utilized for transfer learning on new datasets [9]. In addition, visual geometry
group (VGG), residual network (ResNet), and deep layer aggregation (DLA) models may also be
used [10].

Qin and Li [11] developed a novel face mask identification technique that divides the use of face
masks into three categories: not wearing a face mask, wearing one incorrectly, and wearing one
correctly. The suggested algorithm's face detection accuracy is 98.70%. In [12], the best model for
recognizing face masks was determined by comparing three classifiers: MobileNet, support vector
machine (SVM), and k-nearest neighbour (k-NN). The outcomes demonstrated that MobileNet outperforms
SVM and k-NN in terms of accuracy. Militante and Dionisio [13] employed deep learning techniques
for recognizing faces and face masks and demonstrated their effectiveness in computer vision
detection. The trained model performs with a 96% accuracy rate on the dataset that was collected.
Their face mask recognition system is connected to a Raspberry Pi, which raises a notification and
captures a facial image if the individual being monitored is not wearing a mask. Loey et al. [14]
introduced a hybrid approach based on deep learning for classifying face masks. To check whether
face masks are present in images, the authors integrated the ResNet-50 feature extraction network
and applied the transfer learning method. The hybrid model produced a classification accuracy of
99.6%. Nieto-Rodriguez et al. [15] presented a system for determining whether the required medical
mask is not being worn. It aimed to decrease the rate of false positives (FP) in face detection
while maintaining the ability to recognize masks. The suggested system had a 95% accuracy rate.
Sadhukhan and Bhattacharya [16] proposed a hybrid face mask detection system that combines
traditional methods, deep learning, and handcrafted feature extractors. Considering the small
amount of data, robust features were extracted using both handcrafted and deep learning techniques
(CNN, local binary patterns, Hu moments, and Haralick texture features). Then, features were
selected using principal component analysis (PCA).

Chen et al. [17] provided a mobile phone-based method to detect face masks. They extracted several
features from micro-photos of face masks using the gray-level co-occurrence matrix (GLCM). The k-NN
algorithm was then used to develop a three-result detection system, which obtained an accuracy of
82.87% according to validation results. Saravanan et al. [18] suggested a system based on the
pre-trained deep learning model VGG16. The suggested method trains only the final layer of VGG16,
which cuts down on training time and effort. Two datasets are used to train and evaluate the
approach, which provides an accuracy of 96.50% when tested with a small dataset and 91% with a
medium dataset. Meivel et al. [19] proposed a MATLAB-based method for detecting masks in complex
images. The faster R-CNN technique and the dataset allocation were specified in MATLAB, and complex
images were handled by a facial recognition system. This system achieved high results.
Ieamsaard et al. [20] investigated a face mask identification technique based on the YOLOv5 deep
learning model. Comparative models were trained using different numbers of epochs, ranging from 20
to 500. The model that performed best in the tests was trained for 300 epochs and reached 96.5%
accuracy. To deploy face mask identification more effectively in the real world, especially when
monitoring mask wearing in public areas, Yang et al. [21] suggested replacing manual face mask
detection with a YOLOv5-based method. The experimental findings demonstrate that their algorithm
can successfully identify face masks in public areas. Kumar et al. [8] proposed a mini YOLO v4
variant with a revised and enhanced prediction network that incorporates a modified dense spatial
pyramid pooling (SPP) module to improve prediction accuracy. They also used the Mish activation
function, additional detection layers, and modified anchor boxes to improve the mini YOLO v4
backbone architecture. High accuracy was attained with the suggested method.


3. METHOD 

Subsection 3.1 describes the CNN architecture of the MobileNetV2 model, and subsection 3.2 explains
transfer learning. Figure 2 shows the flow diagram of the proposed system. The diagram consists of
a pre-processing stage using the Haar wavelet approach, followed by splitting the data into
training data and testing data. The MobileNetV2 model is then used in the proposed system.


Figure 2. General flow diagram of the proposed system (stages shown: training MobileNetV2, saving
model, testing model, face mask detection)


3.1. MobileNetV2 model

A CNN with 53 layers called MobileNetV2 [22] is used in this research. This model has been trained
on the ImageNet database, so the network can categorize images into more than a thousand different
object categories. The network takes input images of 224 by 224 pixels. MobileNetV2 is the second
generation of the MobileNet models. The number of parameters in this version is substantially
lower, which makes the network thinner. Because it is lightweight, it functions well in embedded
and mobile systems. Since MobileNetV2 is more lightweight than MobileNetV1, it can also be used in
web browsers, which have less storage, computation power, and graphics processing capability.

MobileNetV2 is based on an inverted residual structure in which the shortcut connections are
located between the thin bottleneck layers. To retain representational power, it also eliminates
nonlinearities in the narrow layers. In semantic segmentation, object detection, and
classification, MobileNetV2 improves on the state of the art for mobile visual recognition, and it
is a major improvement over MobileNetV1. The bottlenecks encode the intermediate inputs and outputs
of the model, while the inner layer contains the ability of the model to transform lower-level
descriptors, such as pixels, into higher-level descriptors, such as image categories. Finally, as
with conventional residual connections, shortcuts allow for quicker training and improved accuracy.
Figure 3 shows the MobileNetV2 block diagram. The layers and functions in this diagram are
explained in the following subsections [6].


Figure 3. Architecture of MobileNetV2 [23] (block labels: input image; Conv 2D; building block with
Conv 1x1 ReLU, Dwise 3x3 ReLU, Conv 1x1 linear, and shortcut; detection)


3.1.1. Convolutional layer 

Convolutional layers serve as the main building components. Convolution is the application of a
filter to an input to create an activation. Applying the same filter repeatedly across an input
produces a feature map, which shows the locations and strength of a detected feature in the input,
such as an image. CNNs are innovative in that they can automatically learn a large number of
filters in parallel that are fitted to a training dataset under the constraints of a specific
predictive modeling problem, such as image categorization. The outcome is a set of very specific
features that can be detected anywhere in the input images.
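
The feature map idea can be illustrated with a minimal sketch in plain Python. It is not the
implementation used in this research: the 4x4 image and the hand-picked vertical-edge filter are
hypothetical, and the function computes the sliding weighted sum (without flipping the filter) that
deep learning libraries conventionally call convolution.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum of the image patch currently under the filter.
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Toy 4x4 image: dark left half (0) and bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked vertical-edge filter (an assumption; a CNN learns its filters).
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The non-zero middle column of the feature map marks where the dark-to-bright edge is located, and
its magnitude indicates how strongly the filter responds there.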


3.1.2. Fully-connected layer 

These layers are added to the model and are fully connected to the activations of the preceding
layer. They make the binary or multi-class categorization of the provided images possible. One
activation function utilized in these layers is softmax, which gives the probability of each
predicted output class.
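
As a minimal sketch of how softmax turns the outputs of a three-node layer into class
probabilities, consider the following plain-Python example. The raw scores are hypothetical values
chosen for illustration, not outputs of the proposed system.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


CLASSES = ["proper mask", "improper mask", "no mask"]
logits = [2.0, 0.5, -1.0]  # hypothetical raw scores from the output layer

probs = softmax(logits)
predicted = CLASSES[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

The class with the largest raw score also receives the largest probability, so the predicted label
here is the first class.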


3.1.3. Non-linear layer 


A non-linear activation layer is what gives a neural network its nonlinear characteristic. It
means that MobileNetV2 can successfully approximate nonlinear functions, or correctly predict the
class of inputs that are separated by a nonlinear decision boundary. Typically, these layers are
placed after the convolutional layers.
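
A short sketch shows why such an activation is non-linear. Figure 3 labels the activation as ReLU;
the ReLU6 variant included here is the capped form used in the original MobileNetV2 design, and
whether the proposed system uses ReLU or ReLU6 is not stated, so both are shown as assumptions.

```python
def relu(x):
    """Rectified linear unit: passes positive values and zeroes out negatives."""
    return max(0.0, x)


def relu6(x):
    """ReLU capped at 6."""
    return min(max(0.0, x), 6.0)


a, b = -1.0, 2.0
# A linear function f would satisfy f(a + b) == f(a) + f(b); ReLU does not.
sum_of_outputs = relu(a) + relu(b)
output_of_sum = relu(a + b)
print(sum_of_outputs, output_of_sum)  # 2.0 1.0
print(relu6(10.0))                    # 6.0
```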


3.1.4. Linear bottlenecks 

The inverted residual block reverses the traditional bottleneck architecture. It significantly
increases performance and optimizes the model complexity, and it has been commonly used in later
mobile network designs due to its excellent efficiency. The linear bottleneck is the final
convolution of a residual block, which has a linear output before it is added to the initial
activations of the block.
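
The cost of such a block can be illustrated by counting weights. The following is a rough sketch
under stated assumptions: the channel count of 64 is hypothetical, the expansion factor of 6
follows the default of the original MobileNetV2 design rather than anything reported in this
research, and biases and normalization parameters are ignored.

```python
def inverted_residual_weights(c_in, c_out, expansion=6, k=3):
    """Weights per stage: 1x1 expand -> k x k depthwise -> 1x1 linear projection."""
    hidden = c_in * expansion
    return {
        "expand_1x1": c_in * hidden,           # widens the thin bottleneck
        "depthwise_kxk": k * k * hidden,       # one k x k filter per channel
        "project_1x1_linear": hidden * c_out,  # linear bottleneck, no activation
    }


def standard_conv_weights(c_in, c_out, k=3):
    """Weights of an ordinary k x k convolution that mixes all channels."""
    return k * k * c_in * c_out


block = inverted_residual_weights(c_in=64, c_out=64)
block_total = sum(block.values())
# Same 3x3 spatial filtering at the expanded width (64 * 6 = 384 channels),
# but done with a standard convolution instead of a depthwise one.
dense_at_hidden_width = standard_conv_weights(384, 384)

print(block)
print(block_total)            # 52608
print(dense_at_hidden_width)  # 1327104
```

Under these assumptions, the depthwise stage filters 384 channels with 3,456 weights, whereas a
standard 3x3 convolution over the same 384 channels would need 1,327,104, which is why the wide
inner layer of the block remains cheap.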


3.2. Transfer learning 

Knowledge transfer between tasks appears to come naturally to human learners. In other words, we
recognize and use the relevant information from earlier learning experiences when confronted with
new situations. The more a new task matches our previous experience, the easier it is for us to
learn. Traditional machine learning algorithms, on the other hand, address isolated tasks. Transfer
learning aims to change this by creating strategies that leverage knowledge learned from previous
tasks to improve learning in a target task. The advancement of knowledge transfer techniques is a
step toward making machine learning as efficient as human learning. Transfer methods are typically
straightforward extensions of the machine learning algorithms that were employed to learn the
tasks, and thus depend heavily on those algorithms. The approaches employed in transfer learning
include well-known classification and inference techniques such as Markov logic networks and neural
networks.

There are three typical ways in which transfer might enhance learning. The first is the initial
performance achieved in the target task using only the transferred knowledge, compared with the
initial performance of an uninformed agent. The second is the difference between the highest
performance that can be attained in the target task with transfer learning and the performance
achieved without it. The third is the speedup gained by using the transferred knowledge instead of
learning from scratch. The proposed system makes use of deep neural networks, whose training
requires a lot of processing time and computing resources. To overcome these challenges, transfer
learning is used to train the network.

In this research, the output layer must contain only three nodes, corresponding to the proper mask
class, the improper mask class, and the no-mask class. TensorFlow was used to load the ImageNet
pre-trained weights. To prevent the impairment of previously learned features, the base layers of
the MobileNetV2 model are then frozen. The other layers are then trained with the gathered MAFA
dataset to learn the features needed to distinguish between a face that is wearing a mask
correctly, one that is wearing a mask incorrectly, and one that is not wearing a mask at all.
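
The freezing step can be illustrated with a small, framework-free sketch. It is not the TensorFlow
code used in this research: the tiny fixed "base" stands in for the frozen MobileNetV2 layers, its
weight values and the toy inputs are hypothetical, and only the new three-node softmax head is
updated by gradient descent.

```python
import copy
import math

# Stand-in for pre-trained base layers (hypothetical values, not MobileNetV2 weights).
BASE_WEIGHTS = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]


def frozen_base(x):
    """Fixed feature extractor: ReLU of a linear map; its weights are never updated."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in BASE_WEIGHTS]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def head_forward(head, feats):
    """New three-node output layer applied to the frozen features."""
    scores = [sum(w * f for w, f in zip(row, feats)) + b
              for row, b in zip(head["W"], head["b"])]
    return softmax(scores)


def mean_loss(head, data):
    """Average cross-entropy of the head over (input, label) pairs."""
    return sum(-math.log(head_forward(head, frozen_base(x))[y]) for x, y in data) / len(data)


def train_head(head, data, lr=0.5, epochs=300):
    """Stochastic gradient descent that updates only the head's weights and biases."""
    for _ in range(epochs):
        for x, label in data:
            feats = frozen_base(x)
            probs = head_forward(head, feats)
            for k in range(len(probs)):
                grad = probs[k] - (1.0 if k == label else 0.0)
                head["b"][k] -= lr * grad
                for j, f in enumerate(feats):
                    head["W"][k][j] -= lr * grad * f


# Toy inputs labelled 0 = proper mask, 1 = improper mask, 2 = no mask.
data = [([2.0, 0.0], 0), ([0.0, 2.0], 1), ([-2.0, -2.0], 2)]
head = {"W": [[0.0] * 3 for _ in range(3)], "b": [0.0] * 3}

base_before = copy.deepcopy(BASE_WEIGHTS)
loss_before = mean_loss(head, data)
train_head(head, data)
loss_after = mean_loss(head, data)
print(f"loss before: {loss_before:.3f}, loss after: {loss_after:.3f}")
print("base unchanged:", BASE_WEIGHTS == base_before)
```

In TensorFlow/Keras, the same effect is usually obtained by setting `trainable = False` on the
pre-trained base model before compiling, so that the optimizer only updates the newly added layers.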


4. EXPERIMENTAL RESULTS 

The first step is to train the model with the appropriate dataset to predict face mask status.
Subsection 4.1 describes the datasets and the pre-processing. The evaluation metrics are described
in subsection 4.2. Finally, subsection 4.3 shows the results of the proposed system and a
comparison with current methods.


4.1. Dataset 

There was a significant amount of noise in the masked face recognition dataset, and the images had
a significant amount of repetition. Since the quality of a dataset affects how accurate a model
will be after the training stage, the dataset used was processed in two steps. First, the repeated
images were detected and then manually deleted. Second, the inaccurate images that were discovered
in the dataset were also removed manually. Different datasets can be used for face mask detection,
such as the Kaggle dataset for face mask detection [24] and the MAFA dataset [25]. The MAFA dataset
has been selected to evaluate the proposed system because the faces in this dataset have different
orientations and levels of occlusion. Therefore, a system must be practical and robust to achieve
high performance with this dataset. The MAFA dataset is also very large and includes different
types of face masks, so it can offer a comprehensive baseline covering all types of masked faces
for face mask detection systems. To build this dataset, its authors first collected a set of facial
photos from the Internet. During this procedure, more than 300,000 photos with faces were retrieved
from social networks using keywords like "face," "mask," "occlusion," and "cover." Only pictures
with a side length of at least 80 pixels were kept. Images with only faces and no occlusion were
then manually eliminated. At least one face is occluded in each of the final 30,811 photos. During
the annotation process, six key attributes of each masked face were recorded for each image: the
locations of the face, eyes, and mask, the face orientation, the occlusion degree, and the mask
type. In this research, feature extraction was carried out using the Haar wavelet approach with a
24x24 window size utilizing the Haar cascade classifier. Only 10,000 face images from the MAFA
dataset have been chosen, and they have been divided into 8,300 images for training and 1,700
images for testing.
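
The 8,300/1,700 division can be sketched as follows. This is an illustration only: the file names
are hypothetical placeholders, and the use of a seeded random shuffle is an assumption, since the
selection procedure is not described.

```python
import random


def split_dataset(items, n_train, seed=42):
    """Shuffle a copy of `items` reproducibly and cut it into train and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical file names standing in for the 10,000 selected MAFA face images.
image_ids = [f"face_{i:05d}.jpg" for i in range(10_000)]
train_ids, test_ids = split_dataset(image_ids, n_train=8_300)

print(len(train_ids), len(test_ids))  # 8300 1700
print(f"train share: {len(train_ids) / len(image_ids):.0%}")  # train share: 83%
```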


4.2. The evaluation metrics 

Several performance criteria are employed to assess the proposed system. They are presented
in (1)-(4). These metrics are based on four quantities. First, true positives (TP) are positive
tuples that are correctly classified as positive by the classifier. Second, true negatives (TN) are
negative tuples that are correctly classified as negative by the classifier. Third, false positives
(FP) are negative tuples that the classifier incorrectly classified as positive. Fourth, false
negatives (FN) are positive tuples that the classifier incorrectly classified as negative.


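$$\text{Precision} = \frac{TP}{TP + FP} \quad (1)$$

$$\text{Recall} = \frac{TP}{TP + FN} \quad (2)$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (3)$$

$$\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (4)$$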
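
The four metrics in (1)-(4) can be computed directly from the confusion counts, as in the
following minimal sketch. The counts are hypothetical and chosen only to make the arithmetic easy
to follow; they are not results of the proposed system. For a three-class problem, one class would
be treated as positive and the remaining classes as negative, which is an assumption about how the
counts are obtained.

```python
def classification_metrics(tp, tn, fp, fn):
    """Precision, recall, accuracy, and F1-score from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


# Hypothetical counts for one class treated as "positive".
metrics = classification_metrics(tp=90, tn=95, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts, precision is 90/95, recall is 90/100, accuracy is 185/200, and the F1-score is
the harmonic mean of precision and recall, which equals 180/195.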


4.3. The results 

We used the Adam optimization method to train our model, setting the learning rate for updating
network weights to 0.0001, the number of iterations to 100 for each epoch, and the batch size to
32. The accuracy, precision, recall, and F1-score are 99.4%, 99.4%, 98.6%, and 99.2%, respectively.
Table 1 shows that the suggested system performs better than RetinaFaceMask and achieves results
comparable with those in [26]. In particular, when compared to RetinaFaceMask, the proposed model
provides 6 percentage points greater precision in mask identification, and the recall has increased
by 4.1 percentage points. When compared to the results of [26], the proposed model provides 0.48
percentage points greater precision in mask identification, and the recall has increased by 0.36
percentage points. Table 2 shows the results of the proposed system using different learning rates.
Figures 4-6 show samples of various test results for identifying people who wear masks properly,
wear masks improperly, or wear no mask. They show the accuracy of the predictions of the proposed
system on some images from the MAFA dataset. The red boxes in Figure 4 represent the detection of
people who are not wearing a mask. Figure 5 shows the results of the proposed system on four
samples of people who wear a face mask properly (blue boxes). Finally, the results of the proposed
system on four samples of individuals who wear face masks improperly are shown in Figure 6 (yellow
boxes).
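
For readers unfamiliar with Adam, a single scalar update with the learning rate of 0.0001 can be
sketched as below. The values of beta1, beta2, and epsilon are the commonly used defaults and are
assumptions, since only the learning rate, iterations per epoch, and batch size are reported; the
starting weight and gradient are hypothetical.

```python
import math


def adam_step(w, grad, state, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar weight; `state` holds the running moments."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad          # first moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad   # second moment
    m_hat = state["m"] / (1 - beta1 ** state["t"])                # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


state = {"t": 0, "m": 0.0, "v": 0.0}
w = adam_step(1.0, grad=0.5, state=state)  # hypothetical weight and gradient
print(w)

# Images processed per epoch if each of the 100 iterations consumes one batch of 32.
images_per_epoch = 100 * 32
print(images_per_epoch)  # 3200
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the weight
moves by approximately the learning rate regardless of the gradient's magnitude.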

The proposed system has been compared with two cutting-edge methods: the RetinaFaceMask public
baseline results and the results of the face mask detection system presented by the authors
in [26]. RetinaFaceMask's effectiveness after training with the MAFA dataset is evaluated using
precision and recall for face mask identification; therefore, the performance of the proposed
system is measured in the same setting for comparison purposes. Precision and recall are two common
criteria for comparing the performance of such systems.


Table 1. Comparison of the proposed system with current methods

Model                             Precision (%)   Recall (%)
RetinaFaceMask based on ResNet    93.4            94.5
Sethi et al. [26]                 98.92           98.24
Ge et al. [25]                    76.40           -
The proposed system               99.40           98.60


Table 2. The performance of the proposed system with different learning rates

Learning rate    Precision (%)   Recall (%)
0.00001          95.7            95.9
0.00005          96.1            96.5
0.0001 (best)    99.40           98.60
0.0005           95.3            96.1
0.001            94.1            94.9


Figure 6. The results of wearing a face mask improperly


5. CONCLUSION 

COVID-19 is a pandemic disease that spreads through both direct and indirect human contact. The
proposed system is capable of classifying people into three groups based on whether and how they
are wearing face masks. This system has used a pre-trained deep learning model called MobileNetV2
and image processing to detect face masks. Moreover, a highly reliable and cost-effective solution
was introduced by applying transfer learning to the MobileNetV2 model, and considerable
experimentation over a large dataset has been carried out. The MAFA dataset has been used to train
and test the proposed system. Since the MAFA dataset contains incorrect images, they have been
manually deleted to increase the precision of the detection and reduce inaccurate predictions in
the proposed system. The proposed system has achieved very impressive results with an accuracy of
99.4%. The proposed method will be tested in real-time circumstances in upcoming work.
Additionally, the proposed method can be utilized in conjunction with face masks to identify facial
landmarks for biometric functions.